Focused Web Crawling for E-Learning Content
Abstract
This work describes the design of a focused crawler for Intinno, an intelligent web-based content management system. The Intinno system aims to overcome a drawback of existing learning management systems: scarcity of content, which often leads to the cold-start problem. The scarcity problem is addressed by using a focused crawler to mine educational content from the web, specifically course pages hosted on university websites. We present a survey of probabilistic models, such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs), for building a focused crawler, and then describe the design of the system using CRFs.
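The idea of a focused crawler can be illustrated with a minimal best-first sketch. This is not the Intinno design: the keyword-overlap scorer below is a simple stand-in for a trained model such as a CRF, and every name in it (`COURSE_KEYWORDS`, `relevance`, `focused_crawl`, `TOY_WEB`, the `univ/...` URLs) is hypothetical. The crawl runs over a small in-memory link graph rather than the live web, so that it stays self-contained.

```python
import heapq

# Hypothetical keyword list standing in for a trained relevance model (e.g. a CRF).
COURSE_KEYWORDS = {"course", "syllabus", "lecture", "assignment", "instructor"}


def relevance(text):
    """Fraction of the hypothetical course keywords that occur in the text."""
    words = set(text.lower().split())
    return len(COURSE_KEYWORDS & words) / len(COURSE_KEYWORDS)


def focused_crawl(seed, pages, max_pages=10, threshold=0.4):
    """Best-first crawl over an in-memory link graph.

    pages maps url -> (page_text, [(anchor_text, target_url), ...]).
    Returns the URLs whose page text scores at or above the threshold,
    in the order they were visited.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # unknown URL in the toy graph: nothing to fetch
        visited += 1
        text, links = pages[url]
        if relevance(text) >= threshold:
            relevant.append(url)
        # Queue unseen outlinks, prioritised by how promising their anchor text looks.
        for anchor, target in links:
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-relevance(anchor), target))
    return relevant


# Toy stand-in for a university website (entirely made up for illustration).
TOY_WEB = {
    "univ/home": ("welcome to the university",
                  [("sports news", "univ/sports"), ("course catalog", "univ/courses")]),
    "univ/courses": ("course list with syllabus and lecture notes",
                     [("cs101 course syllabus", "univ/cs101")]),
    "univ/cs101": ("cs101 course syllabus lecture schedule assignment instructor", []),
    "univ/sports": ("football scores", []),
}
```

Calling `focused_crawl("univ/home", TOY_WEB)` follows the course-related links before the sports link, because the frontier is ordered by anchor-text score. A probabilistic model would replace `relevance` with a learned estimate of how likely a link is to lead to a course page.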
Similar resources
Semantic Focused Crawling for Retrieving E-Commerce Information
Focused crawling is proposed to selectively seek out pages that are relevant to a predefined set of topics without downloading all pages of the Web. With the rapid growth of e-commerce, how to discover specific information (about buyers, sellers, products, etc.) suited to the online business user has become a central issue for information search engines. We present a novel sema...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
StumbleUpon Evergreen Classification Challenge (Website Classification Problem)
Web classification is a very important machine learning problem with wide applicability in tasks such as news classification, content prioritization, focused crawling and sentiment analysis of web content. In this project, we primarily focus on developing a prediction model using machine learning techniques for one such problem, which classifies whether a web posting is of eternal relevance, known as ev...
Exploiting Multiple Features with MEMMs for Focused Web Crawling
Focused web crawling traverses the Web to collect documents on a specific topic. This is not an easy task, since focused crawlers need to identify the next most promising link to follow based on the topic and the content and links of previously crawled pages. In this paper, we present a framework based on Maximum Entropy Markov Models (MEMMs) for an enhanced focused web crawler to take advantage ...
A New Approach Towards Vertical Search Engines - Intelligent Focused Crawling and Multilingual Semantic Techniques
Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well a...